[PatentMT] Summary Report of Team III_CYUT_NTHU

Authors

  • Joseph Z. Chang
  • Ho-Ching Yen
  • Shih-Ting Huang
  • Ming-Zhuan Jiang
  • Chung-Chi Huang
  • Jason S. Chang
  • Ping-Che Yang
Abstract

In this report, we investigate two issues facing phrase-based machine translation (MT) systems such as Moses (Koehn et al., 2007): out-of-vocabulary (OOV) words and singletons. MT systems typically leave unknown or OOV source words untranslated and copy them directly into the target translation. Singletons are words that do not combine with their preceding or following words to form phrases; for these, MT systems typically leave translation disambiguation to the language model, whose knowledge is limited by its preset context length in words. We first analyze the proportion of OOV words and singletons in the translation task, summarize the types of OOV words, and manually evaluate the impact of singletons on phrase-based MT systems. We also introduce methods for dealing with these two issues without changing the underlying phrase-based decoder.
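The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the function names `oov_rate` and `find_singletons`, the whitespace tokenization, and the restriction of the phrase table to two-word source phrases are all assumptions made for illustration only.

```python
from typing import List, Set, Tuple


def oov_rate(train_sentences: List[str], test_sentences: List[str]) -> Tuple[float, List[str]]:
    """Return the fraction of test tokens unseen in training, plus those tokens.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    vocab = {tok for sent in train_sentences for tok in sent.split()}
    test_tokens = [tok for sent in test_sentences for tok in sent.split()]
    if not test_tokens:
        return 0.0, []
    oov = [tok for tok in test_tokens if tok not in vocab]
    return len(oov) / len(test_tokens), oov


def find_singletons(tokens: List[str], phrase_table: Set[Tuple[str, str]]) -> List[str]:
    """Return tokens that form no two-word phrase with either neighbor.

    Assumption: phrase_table is a hypothetical set of two-word source
    phrases; a real phrase table also holds longer phrases and scores.
    """
    result = []
    for i, tok in enumerate(tokens):
        with_prev = i > 0 and (tokens[i - 1], tok) in phrase_table
        with_next = i + 1 < len(tokens) and (tok, tokens[i + 1]) in phrase_table
        if not with_prev and not with_next:
            result.append(tok)
    return result


if __name__ == "__main__":
    train = ["the device comprises a sensor", "a sensor is connected"]
    test = ["the sensor comprises a thermistor"]
    rate, unknown = oov_rate(train, test)
    print(rate, unknown)

    toy_phrases = {("comprises", "a")}  # toy data, not from the paper
    print(find_singletons(test[0].split(), toy_phrases))
```

In this toy setting, a word can be both OOV and a singleton; the report treats the two as separate issues, so a fuller analysis would likely count them separately.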


Similar resources

SRI Submissions to Chinese-English PatentMT NTCIR10

The SRI team joined the subtask of Chinese-English Patent machine translation evaluation, and submitted the translation results using a combined output from two types of grammars supported in SRInterp, with two different word segmentations. We investigated the effect of adding sparse features, together with several optimization strategies. Also, for the PatentMT domain, we carried out pr...


SRI's Submissions to Chinese-English PatentMT NTCIR10 Evaluation

The SRI team joined the subtask of Chinese-English Patent machine translation evaluation, and submitted the translation results using a combined output from two types of grammars supported in SRInterp [13], with two different word segmentations. We investigated the effect of adding sparse features, together with several optimization strategies. Also, for the PatentMT domain, we carried out preli...


EBMT System of KYOTO Team in PatentMT Task at NTCIR-9

This paper describes the “KYOTO” EBMT system that participated in the PatentMT task at NTCIR-9. When translating between very different language pairs such as Japanese-English and Chinese-English, it is very important to handle sentences as tree structures to overcome the differences. Some works incorporate tree structures in parts of the translation process, but not all the way from model training (parallel sente...


Pancreas Transplantation and Report of 1st one in IRAN

SUMMARY Since 1923, type I diabetic patients have been treated with insulin injections. Mortality among these patients has decreased compared with patients not using insulin, but many of them have developed complications of diabetes mellitus, such as nephropathy, retinopathy, and neuropathy. The treatment of choice for this disease and for preventing its complications is pancreas transplantation. The 1st pancreas...


UQAM's System Description for the NTCIR-10 Japanese and English PatentMT Evaluation Tasks

This paper describes the development of a Japanese-English and English-Japanese translation system for the NTCIR-10 PatentMT tasks. The MT system is based on the provided training data and the Moses decoder. We report our first attempt at statistical machine translation for these language pairs and the patent domain.




Publication date: 2011